arXiv:1504.06817v1 [cs.LG] 26 Apr 2015


Analysis of Nuclear Norm Regularization for 
Full-rank Matrix Completion 


Lijun Zhang zhanglj@lamda.nju.edu.cn 

National Key Laboratory for Novel Software Technology 
Nanjing University, Nanjing 210023, China 

Tianbao Yang tianbao-yang@uiowa.edu 

Department of Computer Science 

The University of Iowa, Iowa City, IA 52242, USA

Rong Jin rongjin@cse.msu.edu

Department of Computer Science and Engineering 
Michigan State University, East Lansing, MI 48824, USA 

Zhi-Hua Zhou zhouzh@lamda.nju.edu.cn

National Key Laboratory for Novel Software Technology 
Nanjing University, Nanjing 210023, China 


Abstract 

In this paper, we provide a theoretical analysis of the nuclear-norm regularized least squares
for full-rank matrix completion. Although similar formulations have been examined by
previous studies, their results are unsatisfactory because only additive upper bounds are
provided. Under the assumption that the top eigenspaces of the target matrix are incoherent,
we derive a relative upper bound for recovering the best low-rank approximation of
the unknown matrix. Our relative upper bound is tighter than previous additive bounds
of other methods if the mass of the target matrix is concentrated on its top eigenspaces,
and also implies perfect recovery if it is low-rank. The analysis is built upon the optimality
condition of the regularized formulation and existing guarantees for low-rank matrix
completion. To the best of our knowledge, this is the first time such a relative bound is proved for
the regularized formulation of matrix completion.

Keywords: matrix completion, nuclear norm regularization, least squares, low-rank, full-rank

1. Introduction 

Matrix completion is concerned with the problem of recovering an unknown matrix from
a small fraction of its entries (Candes and Tao, 2010). Recently, the problem of low-rank
matrix completion has received significant interest due to theoretical advances (Candes and
Recht, 2009; Keshavan et al., 2010a), as well as its applicability to a wide range of real
problems, including collaborative filtering (Goldberg et al., 1992), sensor networks (Biswas
et al., 2006), computer vision (Cabral et al., 2011), and machine learning (Jalali et al.,
2011).

Let A be an unknown matrix of size $m \times n$, and without loss of generality, we assume
$m \le n$. The information available about A is a sampled set of entries $A_{ij}$, $(i,j) \in \Omega$, where
$\Omega$ is a subset of the complete set of entries $[m] \times [n]$. Our goal is to recover A as precisely
as possible. In a seminal work, Candes and Recht (2009) assume A is low-rank, and pose
the following nuclear norm minimization problem

$$\min_{B} \ \|B\|_* \quad \text{s.t.} \quad B_{ij} = A_{ij}, \ \forall (i,j) \in \Omega. \qquad (1)$$

Under the incoherence condition, Candes and Recht (2009) show that with a high probability
the solution to (1) yields exact reconstruction of A, provided that a sufficiently large
number of entries are observed randomly. Although the optimization problem in (1) is
convex and can be formulated as a semi-definite program (Fazel et al., 2001), it is
computationally expensive due to the large polynomial dependence on m and n. In the case of
full-rank matrix completion, a similar nuclear norm minimization problem has been
proposed. Suppose $A = Z + N$, where Z is the low-rank matrix that we want to recover, and
N is the residual matrix. Candes and Plan (2010) introduce the following problem

$$\min_{B} \ \|B\|_* \quad \text{s.t.} \quad \sqrt{\sum_{(i,j) \in \Omega} (B_{ij} - A_{ij})^2} \le \delta, \qquad (2)$$

where $\delta$ is an upper bound for $\sqrt{\sum_{(i,j) \in \Omega} N_{ij}^2}$. The formulation in (2) also has strong
guarantees if $\delta$ is large enough, but optimization is still a challenge.

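On the other hand, practitioners prefer to solve the following nuclear-norm regularized
least squares problem

$$\min_{B \in \mathbb{R}^{m \times n}} \ \frac{1}{2} \sum_{(i,j) \in \Omega} (B_{ij} - A_{ij})^2 + \lambda \|B\|_* \qquad (3)$$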


for which many efficient optimization methods have been designed (Ji and Ye, 2009; Toh and
Yun, 2010; Pong et al., 2010; Hsieh and Olsen, 2014). Suppose we use first-order algorithms
to optimize the above problems. Due to the non-smoothness of the objective function, the
convergence rates for (1) and (2) are $O(1/\sqrt{T})$, where T is the number of iterations (Nesterov,
2004). On the other hand, the convergence rate for (3) is $O(1/T^2)$ (Nesterov, 2013) or even
linear under certain weak assumptions (Hou et al., 2013). Although (3) is
computation-friendly, its recovery guarantee remains unclear. One may argue that (2) and (3) are
equivalent by setting $\delta$ and $\lambda$ appropriately, but the exact correspondence between them is
unknown in general. We note that a similar phenomenon also happens in compressive
sensing. The $\ell_1$-norm minimization problem has solid theoretical guarantees (Candes and Tao,
2005; Candes, 2008), but the $\ell_1$-regularized least squares is more efficient in practice (Xiao
and Zhang, 2012).
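
As a rough illustration of why (3) is considered computation-friendly, the following sketch runs plain proximal gradient descent on it, where the proximal step is singular value soft-thresholding. It is a minimal NumPy example under our own assumptions, not one of the cited solvers: the function names (`svt`, `objective`, `prox_grad`), the fixed step size, and the use of a dense SVD are illustrative choices, and the sample is passed as index arrays `rows` and `cols` that may contain duplicates.

```python
import numpy as np

def svt(X, tau):
    """Singular value soft-thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(B, A, rows, cols, lam):
    """Objective of (3): 0.5 * sum over sampled entries of (B_ij - A_ij)^2 + lam * ||B||_*."""
    resid = B[rows, cols] - A[rows, cols]
    return 0.5 * np.sum(resid ** 2) + lam * np.linalg.svd(B, compute_uv=False).sum()

def prox_grad(A, rows, cols, lam, n_iter=200):
    """Plain (unaccelerated) proximal gradient on (3).

    Entries of A outside the sample are multiplied by zero and never influence B.
    """
    m, n = A.shape
    counts = np.zeros((m, n))
    np.add.at(counts, (rows, cols), 1.0)   # multiplicity of each sampled index
    step = 1.0 / counts.max()              # 1 / Lipschitz constant of the smooth part
    B = np.zeros((m, n))
    for _ in range(n_iter):
        grad = counts * (B - A)            # gradient of the least-squares term
        B = svt(B - step * grad, step * lam)
    return B
```

With this step size every iteration leaves the objective no larger than before; accelerated variants of the same scheme are what attain the $O(1/T^2)$ rate quoted above.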

To bridge the gap between practice and theory, we investigate the recovery performance
of (3) theoretically. We are interested in the general case that A could be full-rank and
develop theoretical guarantees for recovering the best rank-r approximation of A, denoted
by $A_r$. In particular, we would like to measure the recovery error in terms of a relative
upper bound. Let $B_*$ be the optimal solution to (3). A relative upper bound takes the
following form

$$\|B_* - A_r\|_F \le U(r, m, n, |\Omega|) \, \|A - A_r\|_F$$

where $U(\cdot)$ is a certain function of r, m, n and $|\Omega|$.¹ Note that bounds of this kind are
very popular in compressive sensing (Cohen et al., 2009) and low-rank matrix
approximation (Boutsidis et al., 2009).

Similar to previous studies, we assume the top eigenspaces of A satisfy the classical
incoherence condition. Based on the celebrated results from low-rank matrix completion (Recht,
2011), we derive an upper bound for $\|B_* - A_r\|_F$, which induces a relative upper bound
under favorable conditions. We summarize the advantages of our results below.

• We present a general theorem that allows us to bound the recovery error of (3) for
any $\lambda > 0$. In contrast, Candes and Plan (2010) only analyze the performance of (2)
when $\delta \ge \sqrt{\sum_{(i,j) \in \Omega} N_{ij}^2}$.

• By choosing $\lambda$ appropriately, we obtain a relative upper bound that scales with
$\|A - A_r\|_F$. Although similar formulations have been studied by Koltchinskii et al. (2011)
and Negahban and Wainwright (2012), their bounds are additive. To the best of
our knowledge, this is the first relative error bound for the nuclear-norm regularized
formulation.

• Our relative upper bound for (3) is tighter than that for (2) developed by Candes
and Plan (2010), and more general than those proved by Keshavan et al. (2010b) and
Eriksson et al. (2012) under different conditions.

• Compared to the additive upper bounds of other methods (Keshavan et al., 2010b;
Foygel and Srebro, 2011; Koltchinskii et al., 2011), our relative upper bound is tighter
when $\|A - A_r\|_F$ is small. Notice that additive bounds never vanish even when A is
low-rank.

Notations For a matrix X, we use $\|X\|_*$, $\|X\|_F$, $\|X\|$ and $\|X\|_\infty$ to denote its nuclear
norm, Frobenius norm, spectral norm, and the absolute value of the largest element in
magnitude, respectively.


2. Related Work 

In this section, we provide a brief review of existing work. 

2.1 Low-rank Matrix Completion 

The mathematical study of matrix completion began with Candes and Recht (2009).
Specifically, Candes and Recht (2009) have proved that if A obeys the incoherence condition,
$|\Omega| \ge C n^{6/5} r \log(n)$ is sufficient to ensure that the convex problem in (1) succeeds with a high
probability, where C is some constant that does not depend on r, m, and n. The lower
bound for the size of $\Omega$ is subsequently improved to $n r \log^6(n)$ under a stronger
assumption (Candes and Tao, 2010). The results presented in (Candes and Recht, 2009; Candes
and Tao, 2010) are without question great breakthroughs, but their proof techniques are
highly involved. In (Gross, 2011; Recht, 2011), the authors present a very elegant approach
for analyzing (1), and give slightly better bounds. For example, Recht (2011) improves the
bound for $|\Omega|$ to $r n \log^2(n)$ and requires the weakest assumptions on A. The
simplification of the analysis also leads to better understanding of matrix completion, and lays the
foundations of the study in this paper.

1. By the triangle inequality, a relative upper bound for recovering $A_r$ directly implies a relative upper
bound for A.

In an alternative line of work, Keshavan et al. (2010a) study matrix completion using
a combination of spectral techniques and manifold optimization. The proposed algorithm,
which is named OPTSPACE, also achieves exact recovery if $|\Omega| \ge C n r \max(\log(n), r)$.
However, the constant C in their bound depends on many factors of A such as the
aspect ratio and the condition number. After the pioneering work mentioned above, various
algorithms and theories of matrix completion have been developed, including distributed
matrix completion (Mackey et al., 2011), matrix completion with side information (Xu et al.,
2013), 1-bit matrix completion (Cai and Zhou, 2013), coherent matrix completion (Chen
et al., 2014), and universal matrix completion (Bhojanapalli and Jain, 2014), to name a few
amongst many.

2.2 Full-rank Matrix Completion

We note that existing studies for full-rank matrix completion differ significantly in the
assumptions they make, so their theoretical guarantees are not directly comparable. In
the following, we will state existing results in the most general form, and (if possible)
characterize their behaviors with respect to m, n, r, and $|\Omega|$.

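Denote the optimal solution of (2) by $\hat{B}$. Under the assumption $\delta \ge \sqrt{\sum_{(i,j) \in \Omega} N_{ij}^2}$,
Theorem 7 of Candes and Plan (2010) shows

$$\|\hat{B} - Z\|_F \le 4\sqrt{\frac{(2mn + |\Omega|)\, m}{|\Omega|}}\,\delta + 2\delta.$$

Let $Z = A_r$, $N = A - A_r$, and consider the optimal choice that $\delta = \sqrt{\sum_{(i,j) \in \Omega} [A - A_r]_{ij}^2}$.
The above bound becomes

$$\|\hat{B} - A_r\|_F \le \left(4\sqrt{\frac{(2mn + |\Omega|)\, m}{|\Omega|}} + 2\right)\sqrt{\sum_{(i,j) \in \Omega} [A - A_r]_{ij}^2}. \qquad (4)$$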

An investigation of OPTSPACE (Keshavan et al., 2010a) for full-rank matrix completion
is done by Keshavan et al. (2010b). In particular, Theorem 1.1 of Keshavan et al. (2010b)
implies the following additive upper bound

o(||.4J»»‘/V/‘yr+!=^||iV||) (5) 

where N is some matrix that depends on $A - A_r$ and $\Omega$. We note that it is possible to derive
a relative upper bound from Theorem 1.2 of Keshavan et al. (2010b), but it requires very
strong assumptions about the coherence, the aspect ratio (n/m), the condition number of $A_r$
and the r-th singular value of A. Thus, the bound derived from Theorem 1.2 of (Keshavan
et al., 2010b) is much more restrictive than the bound proved here.

Foygel and Srebro (2011) study the problem of matrix completion from the viewpoint
of supervised learning. The optimization problem is formulated as least squares
minimization subject to nuclear-norm or max-norm constraints. Their theoretical results follow
from generic generalization guarantees based on the Rademacher complexity. Specifically,
Theorem 6 of Foygel and Srebro (2011) implies the following additive upper bound


O 


^||yl — + n 



+ \/?r 11A — 11 p 



( 6 ) 


where logarithmic factors are ignored. 

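Koltchinskii et al. (2011) have investigated a general trace regression model, which
contains matrix completion as a special case. For matrix completion, they propose the
following optimization problem

$$\min_{B \in \mathbb{R}^{m \times n}} \ \frac{1}{mn}\|B\|_F^2 - \frac{2}{|\Omega|}\sum_{k=1}^{|\Omega|} A_{a_k b_k} B_{a_k b_k} + \lambda\|B\|_*$$

where $(a_k, b_k)$ denotes the k-th sampled index in $\Omega$.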


Let $\hat{B}$ be the optimal solution to the above problem. Under certain conditions, it has been
proved that with a high probability (Koltchinskii et al., 2011, Corollary 2)

$$\|\hat{B} - A\|_F^2 + \|\hat{B} - X\|_F^2 \le \|X - A\|_F^2 + \frac{C m n^2 \log(n)\,\mathrm{rank}(X)}{|\Omega|} \qquad (7)$$

for all $X \in \mathbb{R}^{m \times n}$. However, due to the presence of the second term in the upper bound, it
is impossible to obtain a relative error bound.

In a recent work, Eriksson et al. (2012) consider a high-rank matrix completion problem
in which the columns of A belong to a union of multiple low-rank subspaces. Under certain
assumptions about the coherence as well as the geometrical arrangement of subspaces and
the distribution of the columns in the subspaces, they develop a multi-step algorithm that
is able to recover each column of A with a high probability, as long as $O(rn\log^2(m))$ entries
of A are observed uniformly at random. However, the recovery guarantee of their algorithm
for general full-rank matrices is unclear.

Finally, we note that a formulation similar to (3) has been studied in (Negahban and
Wainwright, 2012), which differs from our paper in the following aspects.

• Negahban and Wainwright (2012) add an $\ell_\infty$-norm constraint to (3) and thus their
optimization problem is a bit more difficult.

• Their analysis relies on the restricted strong convexity assumption, while our analysis
assumes the incoherence condition.

• They derive an additive upper bound. In contrast, we are able to prove a relative upper
bound.


3. Our Results 

We first describe the theoretical guarantees and then provide some discussions. 

3.1 Theoretical Guarantees 

Let $U = [u_1, \ldots, u_r]$ and $V = [v_1, \ldots, v_r]$ be two matrices that contain the first r left
and right singular vectors of matrix A, respectively. Let $e_i$ and $e_j$ be the i-th and j-th
standard basis in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Following the previous studies in matrix
completion (Candes and Recht, 2009; Recht, 2011), we define the coherence measure $\mu_0$ as

$$\mu_0 = \max\left(\frac{m}{r}\max_{1 \le i \le m}\|P_U e_i\|^2, \ \frac{n}{r}\max_{1 \le j \le n}\|P_V e_j\|^2\right)$$

where $P_U = UU^\top$ and $P_V = VV^\top$ are two projection operators. We also define $\mu_1$ as

$$\mu_1 = \max_{i \in [m],\, j \in [n]} \sqrt{\frac{mn}{r}}\,\left|[UV^\top]_{ij}\right|.$$

Define two projection operators $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ for matrices as

$$\mathcal{P}_T(Z) = P_U Z + Z P_V - P_U Z P_V, \quad \text{and} \quad \mathcal{P}_{T^\perp}(Z) = (I - P_U) Z (I - P_V).$$

We assume the indices are sampled uniformly with replacement, and thus $\Omega$ is a collection
that may contain duplicate indices. The linear operator $\mathcal{R}_\Omega : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ is defined as

$$\mathcal{R}_\Omega(Z) = \sum_{(i,j) \in \Omega} \langle e_i e_j^\top, Z \rangle \, e_i e_j^\top.$$

To simplify the notation, we define

$$\epsilon = \|A - A_r\|_F.$$
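
The quantities defined above are easy to compute numerically, which can help build intuition for how mild the incoherence assumption is on a given matrix. The sketch below is illustrative only: the helper names `coherence` and `sample_op` are ours, the sample is represented by index arrays `rows` and `cols` (duplicates allowed), and a dense SVD is assumed to be affordable.

```python
import numpy as np

def coherence(A, r):
    """Return (mu0, mu1) for the top-r singular subspaces of A."""
    m, n = A.shape
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    # ||P_U e_i||^2 equals the squared norm of the i-th row of U (likewise for V).
    mu0 = max(m / r * np.max(np.sum(U ** 2, axis=1)),
              n / r * np.max(np.sum(V ** 2, axis=1)))
    mu1 = np.sqrt(m * n / r) * np.max(np.abs(U @ V.T))
    return mu0, mu1

def sample_op(Z, rows, cols):
    """R_Omega(Z): each sampled index adds Z_ij at position (i, j), duplicates included."""
    out = np.zeros_like(Z, dtype=float)
    np.add.at(out, (rows, cols), Z[rows, cols])
    return out
```

Because the squared row norms of U sum to r, `mu0` always lies between 1 and $\max(m,n)/r$; values close to 1 correspond to singular vectors whose mass is spread evenly over the coordinates.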

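Based on the optimality condition of (3) and the guarantees from low-rank matrix
completion (Recht, 2011), we obtain the following theorem.

Theorem 1 Assume

$$|\Omega| \ge \max\left(114\max(\mu_0, \mu_1^2)\, r(m+n)\beta\log^2(2n), \ \frac{8mn\|A - A_r\|_\infty^2}{3\|A - A_r\|_F^2}\,\beta\log(n)\right) \qquad (8)$$

for some $\beta > 1$, and $n \ge 5$. With a probability at least $1 - 6\log(n)(m+n)^{2-2\beta} - n^{2-2\beta^{1/2}} - n^{-\beta}$,
we have

$$\|\mathcal{P}_{T^\perp}(B_*)\|_F \le \frac{8|\Omega|\epsilon^2}{mn\lambda} + \frac{3mnr\log(2n)\lambda}{|\Omega|},$$

$$\|\mathcal{P}_T(A_r - B_*)\|_F \le 4\epsilon + \frac{2mn\lambda}{|\Omega|}\sqrt{3r\log(2n)} + 64\log(n)\sqrt{\frac{mn\beta}{6|\Omega|}}\,\|\mathcal{P}_{T^\perp}(B_*)\|_F.$$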


As can be seen, our upper bound is valid for any $\lambda > 0$. In contrast, the upper bound for
(2) in (Candes and Plan, 2010) is limited to the case $\delta \ge \sqrt{\langle \mathcal{R}_\Omega(A - A_r), A - A_r \rangle}$.

By choosing $\lambda$ to minimize the upper bounds in the above theorem, we obtain the
following corollary.

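Corollary 2 Under the condition in Theorem 1, set

$$\lambda = \frac{2|\Omega|\epsilon}{mn}\sqrt{\frac{2}{3r\log(2n)}}.$$

With a probability at least $1 - 6\log(n)(m+n)^{2-2\beta} - n^{2-2\beta^{1/2}} - n^{-\beta}$, we have

$$\|\mathcal{P}_{T^\perp}(B_*)\|_F \le 4\sqrt{6r\log(2n)}\,\epsilon,$$

$$\|\mathcal{P}_T(A_r - B_*)\|_F \le \left(10 + 256\sqrt{\frac{mnr\log^3(2n)\beta}{|\Omega|}}\right)\epsilon,$$

and thus

$$\|A_r - B_*\|_F \le O\left(\sqrt{r\log(n)} + \sqrt{\frac{mnr\log^3(n)}{|\Omega|}}\right)\epsilon.$$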
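
The choice of $\lambda$ in Corollary 2 can be checked with a few lines of arithmetic. The sketch below takes the first bound of Theorem 1 as we read it, namely $8|\Omega|\epsilon^2/(mn\lambda) + 3mnr\log(2n)\lambda/|\Omega|$, and evaluates it at the $\lambda$ of Corollary 2; the function names and the numbers in the usage note are hypothetical and serve only as an illustration.

```python
import math

def corollary_lambda(num_samples, eps, m, n, r):
    """lambda = (2 |Omega| eps / (m n)) * sqrt(2 / (3 r log(2n)))."""
    return (2.0 * num_samples * eps / (m * n)
            * math.sqrt(2.0 / (3.0 * r * math.log(2 * n))))

def perp_bound(lam, num_samples, eps, m, n, r):
    """8 |Omega| eps^2 / (m n lam) + 3 m n r log(2n) lam / |Omega|."""
    return (8.0 * num_samples * eps ** 2 / (m * n * lam)
            + 3.0 * m * n * r * math.log(2 * n) * lam / num_samples)

def corollary_value(eps, n, r):
    """4 * sqrt(6 r log(2n)) * eps, the bound stated in Corollary 2."""
    return 4.0 * math.sqrt(6.0 * r * math.log(2 * n)) * eps
```

For instance, with m = 50, n = 80, r = 3, 1500 samples and $\epsilon = 0.7$, `perp_bound` evaluated at `corollary_lambda(...)` coincides with `corollary_value(...)`, and moving $\lambda$ in either direction only increases the bound, since the two terms are balanced at the minimizer.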


3.2 Discussions 

Comparisons Let's consider the most practical scenario $|\Omega| \le mn$, and for simplicity
ignore logarithmic factors. In this case, our relative bound becomes

$$\|A_r - B_*\|_F \le O\left(\sqrt{\frac{mnr}{|\Omega|}}\right)\epsilon.$$

The most comparable result is the relative upper bound derived by Candes and Plan (2010),
because they also rely on the incoherence condition. Using Lemma 2 derived later in this
paper to simplify (4) in Section 2.2, we obtain the following relative upper bound

$$\|A_r - \hat{B}\|_F \le O\left(\sqrt{m}\,\|A - A_r\|_F\right) = O\left(\sqrt{m}\,\epsilon\right)$$

which is always worse than our bound since $|\Omega| \ge Cnr$ for some constant C. Compared to
the relative upper bounds of Keshavan et al. (2010b) and Eriksson et al. (2012), our result
is more general, because their bounds only hold for a very restricted class of matrices.

Our relative bound is tighter than the additive bound in (5) derived by Keshavan et al. 
(2010b), if 


£<0 


n 


3/4 


m 


1/4 


and also tighter than the additive bound in (6) derived by Foygel and Srebro (2011), if
$\epsilon \le O(\sqrt{n})$. Letting $X = A_r$, the additive bound in (7) derived by Koltchinskii et al. (2011)
implies

$$\|\hat{B} - A_r\|_F^2 \le \epsilon^2 + \frac{C m n^2 r\log(n)}{|\Omega|}$$

which is worse than our bound if $\epsilon \le O(\sqrt{n})$.

Low-rank Case In the special case that A is a rank-r matrix, we should interpret
Corollary 2 as describing the limiting behavior of (3). It implies the recovery error will
approach 0 as $\lambda \to 0$. That is, we can set $\lambda$ to be an arbitrarily small constant and the
recovery error is also arbitrarily small.

Assumptions The only assumption that we make is about the cardinality of $\Omega$ in
(8). The first lower bound is essential since it is the necessary condition for us to utilize the
theoretical guarantees developed for low-rank matrix completion (Recht, 2011). In contrast,
the second lower bound is not obligatory. It is just used to facilitate a simple application of
Bernstein's inequality in Lemma 2. Without the second lower bound, we still get a relative
upper bound, but at a larger order. For details, please refer to Section 4.4.

Extensions Although the current result is built upon the result in (Recht, 2011), which
in turn requires the incoherence assumption, it can be easily extended to support other
assumptions for matrix completion. All we need is to replace Theorem 3 in Section 4.1 with
the corresponding theorem derived under other assumptions. We will still get a relative
upper bound, possibly at a different order. For example, if we use the theorems in (Bhojanapalli
and Jain, 2014), our bound becomes a universal guarantee for full-rank matrix completion. We
leave the extension of our analysis to other assumptions as future work.


4. Analysis 

We present the proof of Theorem 1 in this section. 


4.1 Sketch of the Proof 


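As we mentioned before, our analysis is built upon the existing theoretical guarantees for
low-rank matrix completion, which are summarized below (Recht, 2011).

Theorem 3 Suppose

$$|\Omega| \ge 32\max(\mu_0, \mu_1^2)\, r(m+n)\beta\log^2(2n) \qquad (9)$$

for some $\beta > 1$. Then, with a probability at least $1 - 6\log(n)(m+n)^{2-2\beta} - n^{2-2\beta^{1/2}}$, the
following statements are true:

$$\left\|\frac{mn}{|\Omega|}\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\right\| \le \frac{1}{2}, \qquad (10)$$

$$\|\mathcal{R}_\Omega\| \le \frac{8}{3}\sqrt{\beta}\log(n). \qquad (11)$$

There exists a Y in the range of $\mathcal{R}_\Omega$ such that

$$\|\mathcal{P}_T(Y) - UV^\top\|_F \le \sqrt{\frac{r}{2n}}, \qquad (12)$$

$$\|\mathcal{P}_{T^\perp}(Y)\| \le \frac{1}{2}, \qquad (13)$$

$$|\langle Y, A \rangle| \le \sqrt{\frac{3mnr\log(2n)}{8|\Omega|}}\,\sqrt{\langle \mathcal{R}_\Omega(A), A \rangle} \qquad (14)$$

for all $A \in \mathbb{R}^{m \times n}$.

The first part of the above theorem contains concentration inequalities for the random linear
operators $\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T$ and $\mathcal{R}_\Omega$, and the second part describes some important properties of a
special matrix Y, which is used as an (approximate) dual certificate of (1).

Next, we will examine the optimality of $B_*$ based on techniques from convex analysis,
leading to the following theorem.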



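Theorem 4 Let $B_*$ be the optimal solution to (3). Then we have

$$\lambda\langle B_* - A_r, UV^\top \rangle + \lambda\|\mathcal{P}_{T^\perp}(B_*)\|_* \le \langle \mathcal{R}_\Omega(B_* - A), A_r - B_* \rangle. \qquad (15)$$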

Based on Theorems 3 and 4, we are ready to prove the main results. However, the 
analysis is a bit lengthy, so we split it into two parts, and will first show the following 
intermediate theorem. 


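Theorem 5 Under the condition in Theorem 1, with a probability at least
$1 - 6\log(n)(m+n)^{2-2\beta} - n^{2-2\beta^{1/2}} - n^{-\beta}$, we have

$$\frac{1}{2}\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle + \frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_* \le \lambda\sqrt{\frac{r}{2n}}\,\|\mathcal{P}_T(A_r - B_*)\|_F + \Gamma \qquad (16)$$

where

$$\Gamma = \frac{2|\Omega|\epsilon^2}{mn} + \frac{3mnr\log(2n)\lambda^2}{8|\Omega|}. \qquad (17)$$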


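Before going into the details, we introduce a lemma that will be used throughout the
analysis. Since $\Omega$ may contain duplicate indices, $\langle \mathcal{R}_\Omega(A), A \rangle \ne \|\mathcal{R}_\Omega(A)\|_F^2$ in general. We
use the following lemma to take care of this issue.

Lemma 1

$$\langle \mathcal{R}_\Omega(A), A \rangle \le \|\mathcal{R}_\Omega(A)\|_F^2, \qquad (18)$$

$$|\langle \mathcal{R}_\Omega(A), B \rangle| \le \sqrt{\langle \mathcal{R}_\Omega(A), A \rangle}\,\sqrt{\langle \mathcal{R}_\Omega(B), B \rangle} \qquad (19)$$

for all $A, B \in \mathbb{R}^{m \times n}$.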
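
Both inequalities of Lemma 1 are simple to check numerically on a sample with repeated indices. The following sketch uses only the standard library; the helper names `sample_inner` and `sample_fro_sq` and the representation of $\Omega$ as a list of `(i, j)` tuples are our own illustrative choices.

```python
import random

def sample_inner(A, B, omega):
    """<R_Omega(A), B>: sum over sampled (i, j), duplicates included, of A_ij * B_ij."""
    return sum(A[i][j] * B[i][j] for i, j in omega)

def sample_fro_sq(A, omega):
    """||R_Omega(A)||_F^2: an index sampled t times contributes (t * A_ij)^2."""
    counts = {}
    for idx in omega:
        counts[idx] = counts.get(idx, 0) + 1
    return sum((t * A[i][j]) ** 2 for (i, j), t in counts.items())
```

An index that appears t times contributes $t A_{ij}^2$ to `sample_inner(A, A, omega)` but $t^2 A_{ij}^2$ to `sample_fro_sq(A, omega)`, which is exactly why (18) is an inequality rather than an equality once duplicates occur.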

4.2 Proof of Theorem 3 

Except for the last inequality in (14), all the others can be found directly from the proof of
Theorem 2 in (Recht, 2011). Thus, we only provide the derivation of (14). Following the
construction in (Recht, 2011), we partition $\Omega$ into p partitions of size q. By assumption, we
can choose

$$q \ge \frac{128}{3}\max(\mu_0, \mu_1^2)\, r(m+n)\beta\log(m+n)$$

such that

$$p = \frac{|\Omega|}{q} = \frac{3}{4}\log(2n).$$

Let $\Omega_j$ denote the set of indices corresponding to the j-th partition. We define $W_0 = UV^\top$,

$$Y_k = \frac{mn}{q}\sum_{j=1}^{k}\mathcal{R}_{\Omega_j}(W_{j-1}), \qquad W_k = UV^\top - \mathcal{P}_T(Y_k),$$

for $k = 1, \ldots, p$. Then, we set $Y = Y_p$. It has been proved that with a probability at least
$1 - 6\log(n)(m+n)^{2-2\beta} - n^{2-2\beta^{1/2}}$,

$$\|W_k\|_F \le 2^{-k}\sqrt{r}, \qquad (20)$$

$$\left\|\frac{mn}{q}\mathcal{P}_T\mathcal{R}_{\Omega_k}\mathcal{P}_T - \mathcal{P}_T\right\| \le \frac{1}{2}, \qquad (21)$$

for $k = 1, \ldots, p$.

Since Wj = VT{Wj), we have 


Then, 


— {Wn,{H'j),I10) = ( 

21) Q Q 


TTLTi 

IT,-, -—VTnn.VTiWj] 


( 20 ) . 
< — 4 -T 


IF ^ 


( 22 ) 


p 


|(T,A)| <—Y^\{nn,{Wj.,),A)\ 


i=i 


i=i 


<- 


mn 



i=i 


\ i=i ' ' 



( 22 ) 

< 


< 




^^{Tla{A],A) 


■E^ 

i=i 


4.3 Proof of Theorem 4 

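Since $B_*$ is the optimal solution to (3), we have

$$\langle \mathcal{R}_\Omega(B_* - A) + \lambda E, B - B_* \rangle \ge 0 \quad \text{for all } B \in \mathbb{R}^{m \times n} \qquad (23)$$

where $E \in \partial\|B_*\|_*$ is a certain subgradient of $\|\cdot\|_*$ evaluated at $B_*$. Let $F \in \partial\|A_r\|_*$ be any
subgradient of $\|\cdot\|_*$ evaluated at $A_r$. From the property of convexity, we have

$$\langle B_* - A_r, E - F \rangle \ge 0. \qquad (24)$$

From (23) with $B = A_r$ and (24), we get

$$\langle \mathcal{R}_\Omega(B_* - A) + \lambda F, A_r - B_* \rangle \ge 0. \qquad (25)$$

Next, we consider bounding $\lambda\langle F, A_r - B_* \rangle$. From previous studies (Candes and Recht,
2009), we know that the set of subgradients of $\|A_r\|_*$ takes the following form:

$$\partial\|A_r\|_* = \left\{UV^\top + W : W \in \mathbb{R}^{m \times n},\ U^\top W = 0,\ WV = 0,\ \|W\| \le 1\right\}.$$

Thus, we can choose

$$F = UV^\top + \mathcal{P}_{T^\perp}(N),$$

where $N = \arg\max_{\|X\| \le 1}\langle \mathcal{P}_{T^\perp}(B_*), X \rangle$. Then, it is easy to verify that

$$\langle B_* - A_r, F \rangle = \langle B_* - A_r, UV^\top \rangle + \langle B_* - A_r, \mathcal{P}_{T^\perp}(N) \rangle = \langle B_* - A_r, UV^\top \rangle + \|\mathcal{P}_{T^\perp}(B_*)\|_*. \qquad (26)$$

We complete the proof by combining (25) and (26).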


4.4 Proof of Theorem 5 

We first introduce a lemma that will be used later. 
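Lemma 2 Suppose

$$|\Omega| \ge \frac{8mn\|A - A_r\|_\infty^2}{3\|A - A_r\|_F^2}\,\beta\log(n) \qquad (27)$$

for some $\beta > 1$. Then, with a probability at least $1 - n^{-\beta}$, we have

$$\sqrt{\langle \mathcal{R}_\Omega(A - A_r), A - A_r \rangle} \le \sqrt{\frac{2|\Omega|}{mn}}\,\epsilon. \qquad (28)$$

If the condition in (27) does not hold, we can use the following inequality

$$\sqrt{\langle \mathcal{R}_\Omega(A - A_r), A - A_r \rangle} \overset{(18)}{\le} \|\mathcal{R}_\Omega(A - A_r)\|_F \overset{(11)}{\le} \frac{8}{3}\sqrt{\beta}\log(n)\,\epsilon$$

and will obtain a similar bound but at a larger order.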
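
To get a feel for (28), one can simulate uniform sampling with replacement and compare the sampled energy $\langle \mathcal{R}_\Omega(N), N \rangle$ of a residual matrix N with $(2|\Omega|/(mn))\|N\|_F^2$, the square of the right-hand side of (28) as we read it. The snippet below is a small standard-library simulation under our own assumptions (hypothetical helper names, a residual with entries of comparable magnitude); it illustrates the statement and is not a proof.

```python
import random

def sampled_energy(N, num_samples, rng):
    """<R_Omega(N), N> for num_samples indices drawn uniformly with replacement."""
    m, n = len(N), len(N[0])
    total = 0.0
    for _ in range(num_samples):
        i, j = rng.randrange(m), rng.randrange(n)
        total += N[i][j] ** 2
    return total

def lemma2_rhs(N, num_samples):
    """(2 |Omega| / (m n)) * ||N||_F^2."""
    m, n = len(N), len(N[0])
    fro_sq = sum(x * x for row in N for x in row)
    return 2.0 * num_samples / (m * n) * fro_sq
```

The expected value of `sampled_energy` is exactly half of `lemma2_rhs`, so the bound holds with a comfortable margin whenever no single entry dominates $\|N\|_F^2$, which is what condition (27) quantifies through the ratio $\|N\|_\infty^2 / \|N\|_F^2$.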

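We continue the proof by lower bounding $\langle B_* - A_r, UV^\top \rangle$ in (15) of Theorem 4. To this
end, we need the matrix Y given in Theorem 3.

$$\langle B_* - A_r, UV^\top \rangle = \langle B_* - A_r, UV^\top - Y \rangle + \langle B_* - A_r, Y \rangle$$
$$= \langle B_* - A_r, UV^\top - \mathcal{P}_T(Y) \rangle + \langle A_r - B_*, \mathcal{P}_{T^\perp}(Y) \rangle + \langle B_* - A_r, Y \rangle.$$

Next, we bound the last three terms by utilizing the conclusions in Theorem 3.

$$\langle B_* - A_r, UV^\top - \mathcal{P}_T(Y) \rangle = \langle \mathcal{P}_T(B_* - A_r), UV^\top - \mathcal{P}_T(Y) \rangle$$
$$\ge -\|\mathcal{P}_T(B_* - A_r)\|_F\,\|UV^\top - \mathcal{P}_T(Y)\|_F \overset{(12)}{\ge} -\sqrt{\frac{r}{2n}}\,\|\mathcal{P}_T(A_r - B_*)\|_F.$$

$$\langle A_r - B_*, \mathcal{P}_{T^\perp}(Y) \rangle = \langle \mathcal{P}_{T^\perp}(A_r - B_*), \mathcal{P}_{T^\perp}(Y) \rangle$$
$$= \langle \mathcal{P}_{T^\perp}(-B_*), \mathcal{P}_{T^\perp}(Y) \rangle \ge -\|\mathcal{P}_{T^\perp}(B_*)\|_*\,\|\mathcal{P}_{T^\perp}(Y)\| \overset{(13)}{\ge} -\frac{1}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_*.$$

$$\langle B_* - A_r, Y \rangle \overset{(14)}{\ge} -\sqrt{\frac{3mnr\log(2n)}{8|\Omega|}}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}.$$

Putting the above inequalities together, we have

$$\langle B_* - A_r, UV^\top \rangle \ge -\sqrt{\frac{r}{2n}}\,\|\mathcal{P}_T(A_r - B_*)\|_F - \frac{1}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_* - \sqrt{\frac{3mnr\log(2n)}{8|\Omega|}}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}. \qquad (29)$$

Substituting (29) into (15) and rearranging, we get

$$\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle + \frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_*$$
$$\le \langle \mathcal{R}_\Omega(A_r - A), A_r - B_* \rangle + \lambda\sqrt{\frac{r}{2n}}\,\|\mathcal{P}_T(A_r - B_*)\|_F + \lambda\sqrt{\frac{3mnr\log(2n)}{8|\Omega|}}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}. \qquad (30)$$

Furthermore, we have

$$\langle \mathcal{R}_\Omega(A_r - A), A_r - B_* \rangle \overset{(19)}{\le} \sqrt{\langle \mathcal{R}_\Omega(A - A_r), A - A_r \rangle}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}$$
$$\overset{(28)}{\le} \epsilon\sqrt{\frac{2|\Omega|}{mn}}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle} \qquad (31)$$

where Lemma 2 is used in the last inequality.

From (30) and (31), we have

$$\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle + \frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_*$$
$$\le \lambda\sqrt{\frac{r}{2n}}\,\|\mathcal{P}_T(A_r - B_*)\|_F + \epsilon\sqrt{\frac{2|\Omega|}{mn}}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle} + \lambda\sqrt{\frac{3mnr\log(2n)}{8|\Omega|}}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}.$$

We complete the proof by using the basic inequality

$$\frac{1}{4}\alpha^2 - \alpha\beta + \beta^2 \ge 0$$

with $\alpha = \sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}$ and $\beta \in \left\{\epsilon\sqrt{\frac{2|\Omega|}{mn}},\ \lambda\sqrt{\frac{3mnr\log(2n)}{8|\Omega|}}\right\}$.

4.5 Proof of Theorem 1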

We first explain the two lower bounds of $|\Omega|$ in (8). The first one is due to Theorem 3, but
we use a larger constant (114 instead of 32) to ensure

$$\frac{8\log(n)}{3}\sqrt{\frac{rm\beta}{|\Omega|}} \le \frac{1}{4} \qquad (32)$$

which is used later. The second one is due to Lemma 2.

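4.5.1 Upper Bound for $\|\mathcal{P}_{T^\perp}(B_*)\|_F$

We upper bound $\|\mathcal{P}_T(A_r - B_*)\|_F$ in (16) of Theorem 5 by

$$\|\mathcal{P}_T(A_r - B_*)\|_F^2 = \langle \mathcal{P}_T(A_r - B_*), A_r - B_* \rangle \overset{(10)}{\le} \frac{2mn}{|\Omega|}\langle \mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(A_r - B_*), A_r - B_* \rangle.$$

Plugging the above inequality in (16), we have

$$\frac{1}{2}\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle + \frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_* \le \lambda\sqrt{\frac{rm}{|\Omega|}}\,\sqrt{\langle \mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(A_r - B_*), A_r - B_* \rangle} + \Gamma. \qquad (33)$$

Since $\mathcal{P}_T + \mathcal{P}_{T^\perp} = I$, we have

$$\frac{1}{2}\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle = \frac{1}{2}\underbrace{\langle \mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(A_r - B_*), A_r - B_* \rangle}_{:=\Theta^2} + \frac{1}{2}\underbrace{\langle \mathcal{P}_{T^\perp}\mathcal{R}_\Omega\mathcal{P}_{T^\perp}(B_*), B_* \rangle}_{:=\Lambda^2} - \langle \mathcal{R}_\Omega(\mathcal{P}_T(A_r - B_*)), \mathcal{P}_{T^\perp}(B_*) \rangle$$
$$\overset{(19)}{\ge} \frac{1}{2}\Theta^2 + \frac{1}{2}\Lambda^2 - \Theta\Lambda = \frac{1}{2}(\Theta - \Lambda)^2. \qquad (34)$$

Substituting (34) into (33), we have

$$\frac{1}{2}(\Theta - \Lambda)^2 + \frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_* \le \lambda\sqrt{\frac{rm}{|\Omega|}}\,\Theta + \Gamma.$$

Combining with the fact

$$\frac{1}{2}(\Theta - \Lambda)^2 - \lambda\sqrt{\frac{rm}{|\Omega|}}\,(\Theta - \Lambda) + \frac{rm\lambda^2}{2|\Omega|} \ge 0,$$

we have

$$\frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_* \le \lambda\sqrt{\frac{rm}{|\Omega|}}\,\Lambda + \frac{rm\lambda^2}{2|\Omega|} + \Gamma$$
$$\overset{(18)}{\le} \lambda\sqrt{\frac{rm}{|\Omega|}}\,\|\mathcal{R}_\Omega\mathcal{P}_{T^\perp}(B_*)\|_F + \frac{rm\lambda^2}{2|\Omega|} + \Gamma$$
$$\overset{(11)}{\le} \frac{8\lambda\log(n)}{3}\sqrt{\frac{rm\beta}{|\Omega|}}\,\|\mathcal{P}_{T^\perp}(B_*)\|_F + \frac{rm\lambda^2}{2|\Omega|} + \Gamma$$
$$\overset{(32)}{\le} \frac{\lambda}{4}\|\mathcal{P}_{T^\perp}(B_*)\|_F + \frac{rm\lambda^2}{2|\Omega|} + \Gamma$$
$$\le \frac{\lambda}{4}\|\mathcal{P}_{T^\perp}(B_*)\|_* + \frac{2|\Omega|\epsilon^2}{mn} + \frac{3mnr\log(2n)\lambda^2}{4|\Omega|},$$

which, since the Frobenius norm is at most the nuclear norm, implies

$$\|\mathcal{P}_{T^\perp}(B_*)\|_F \le \|\mathcal{P}_{T^\perp}(B_*)\|_* \le \frac{8|\Omega|\epsilon^2}{mn\lambda} + \frac{3mnr\log(2n)\lambda}{|\Omega|}.$$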


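4.5.2 Upper Bound for $\|\mathcal{P}_T(A_r - B_*)\|_F$

Similar to (34), we have

$$\frac{1}{2}\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle \overset{(19)}{\ge} \frac{1}{2}\langle \mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(A_r - B_*), A_r - B_* \rangle + \frac{1}{2}\Lambda^2 - \sqrt{\langle \mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(A_r - B_*), A_r - B_* \rangle}\,\Lambda$$

where

$$\Lambda = \sqrt{\langle \mathcal{P}_{T^\perp}\mathcal{R}_\Omega\mathcal{P}_{T^\perp}(B_*), B_* \rangle} \overset{(18)}{\le} \|\mathcal{R}_\Omega\mathcal{P}_{T^\perp}(B_*)\|_F \overset{(11)}{\le} \frac{8}{3}\sqrt{\beta}\log(n)\,\|\mathcal{P}_{T^\perp}(B_*)\|_F.$$

By plugging the above inequalities into (16) and using (10), we have

$$\frac{|\Omega|}{4mn}\|\mathcal{P}_T(A_r - B_*)\|_F^2 + \frac{1}{2}\Lambda^2 + \frac{\lambda}{2}\|\mathcal{P}_{T^\perp}(B_*)\|_*$$
$$\le \lambda\sqrt{\frac{r}{2n}}\,\|\mathcal{P}_T(A_r - B_*)\|_F + \Gamma + 8\log(n)\sqrt{\frac{|\Omega|\beta}{6mn}}\,\|\mathcal{P}_{T^\perp}(B_*)\|_F\,\|\mathcal{P}_T(A_r - B_*)\|_F$$

and thus

$$\|\mathcal{P}_T(A_r - B_*)\|_F^2 \le \frac{2m\lambda\sqrt{2rn}}{|\Omega|}\,\|\mathcal{P}_T(A_r - B_*)\|_F + 8\epsilon^2 + \frac{3rm^2n^2\lambda^2\log(2n)}{2|\Omega|^2}$$
$$+ 32\log(n)\sqrt{\frac{mn\beta}{6|\Omega|}}\,\|\mathcal{P}_{T^\perp}(B_*)\|_F\,\|\mathcal{P}_T(A_r - B_*)\|_F. \qquad (35)$$

Recall that

$$x^2 \le bx + c \ \Rightarrow \ x \le 2b + \sqrt{2c}.$$

Thus, we have

$$\|\mathcal{P}_T(A_r - B_*)\|_F \le \frac{4m\lambda\sqrt{2rn}}{|\Omega|} + 4\epsilon + \frac{mn\lambda}{|\Omega|}\sqrt{3r\log(2n)} + 64\log(n)\sqrt{\frac{mn\beta}{6|\Omega|}}\,\|\mathcal{P}_{T^\perp}(B_*)\|_F.$$

We complete the proof by noticing

$$4\sqrt{2n} \le n\sqrt{3\log(2n)}, \quad \forall n \ge 5.$$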
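
The two scalar facts used at the end of this proof can be sanity-checked directly. The sketch below assumes our reading of them, namely that $x^2 \le bx + c$ implies $x \le 2b + \sqrt{2c}$ for nonnegative b and c, and that $4\sqrt{2n} \le n\sqrt{3\log(2n)}$ holds for $n \ge 5$; the function names are hypothetical.

```python
import math

def quad_root_bound(b, c):
    """Largest x >= 0 satisfying x^2 <= b * x + c, for b, c >= 0."""
    return (b + math.sqrt(b * b + 4.0 * c)) / 2.0

def n_condition(n):
    """True when 4 * sqrt(2n) <= n * sqrt(3 * log(2n))."""
    return 4.0 * math.sqrt(2.0 * n) <= n * math.sqrt(3.0 * math.log(2.0 * n))
```

The exact root is at most $b + \sqrt{c}$, so the form $2b + \sqrt{2c}$ used in the proof is a looser but convenient bound; `n_condition` fails at n = 4 and holds from n = 5 onwards, which matches the requirement $n \ge 5$ in Theorem 1.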


4.6 Proof of Lemma 1 

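Denote the number of unique indices in $\Omega$ by u, and let $\Theta = \{(a_k, b_k)\}_{k=1}^{u}$ be a set that
contains all the unique indices in $\Omega$. Let $t_k$ denote the number of times that $(a_k, b_k)$ appears in $\Omega$.
Then, we have

$$\langle \mathcal{R}_\Omega(A), A \rangle = \sum_{k=1}^{u} t_k A_{a_k b_k}^2 \le \sum_{k=1}^{u} t_k^2 A_{a_k b_k}^2 = \|\mathcal{R}_\Omega(A)\|_F^2.$$

To show (19), we have

$$|\langle \mathcal{R}_\Omega(A), B \rangle| = \left|\sum_{(i,j) \in \Omega} A_{ij} B_{ij}\right|$$
$$\le \sqrt{\sum_{(i,j) \in \Omega} A_{ij}^2}\,\sqrt{\sum_{(i,j) \in \Omega} B_{ij}^2} = \sqrt{\langle \mathcal{R}_\Omega(A), A \rangle}\,\sqrt{\langle \mathcal{R}_\Omega(B), B \rangle}$$

where the second line follows from the Cauchy-Schwarz inequality.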

4.7 Proof of Lemma 2 

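For each index $(a_k, b_k) \in \Omega$, we define a random variable

$$\xi_k = \langle e_{a_k} e_{b_k}^\top, A - A_r \rangle^2 - \frac{1}{mn}\|A - A_r\|_F^2.$$

Then, it is easy to verify that

$$\mathrm{E}[\xi_k] = 0,$$

$$|\xi_k| \le \max\left(\langle e_{a_k} e_{b_k}^\top, A - A_r \rangle^2,\ \frac{1}{mn}\|A - A_r\|_F^2\right) \le \|A - A_r\|_\infty^2,$$

$$\mathrm{E}[\xi_k^2] = \mathrm{E}\left[\langle e_{a_k} e_{b_k}^\top, A - A_r \rangle^4\right] - \frac{1}{m^2n^2}\|A - A_r\|_F^4 \le \frac{1}{mn}\sum_{i,j}[A - A_r]_{ij}^4 \le \frac{1}{mn}\|A - A_r\|_\infty^2\,\|A - A_r\|_F^2.$$

From Bernstein's inequality, we have

$$P\left[\langle \mathcal{R}_\Omega(A - A_r), A - A_r \rangle \ge \frac{2|\Omega|}{mn}\|A - A_r\|_F^2\right] = P\left[\sum_{k=1}^{|\Omega|}\xi_k \ge \frac{|\Omega|}{mn}\|A - A_r\|_F^2\right]$$
$$\le \exp\left(-\frac{3|\Omega|}{8}\cdot\frac{\|A - A_r\|_F^2}{mn\|A - A_r\|_\infty^2}\right) \overset{(27)}{\le} n^{-\beta}.$$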


5. Conclusion and Future Work 

In this paper, we provide a theoretical analysis of the nuclear-norm regularized formulation
in (3) for matrix completion. Assuming the top eigenspaces are incoherent, a relative upper
bound is derived. Extensive comparisons demonstrate that our bound is tighter than
previous results under favorable conditions.

In certain real-world scenarios, we may further assume the observations are corrupted by
noise. While Theorem 1 only addresses the noise-free case, we can immediately extend our
results to the noisy case. Let N be the matrix of noise. We just need to add $\langle \mathcal{R}_\Omega(N), A_r - B_* \rangle$
to the right hand side of (15), and upper bound it by $\sqrt{\langle \mathcal{R}_\Omega(N), N \rangle}\,\sqrt{\langle \mathcal{R}_\Omega(A_r - B_*), A_r - B_* \rangle}$.
Then, we redefine $\Gamma$ to include $\langle \mathcal{R}_\Omega(N), N \rangle$, and the rest of the proof is almost the same.

One limitation of the current analysis is that the optimal value of $\lambda$ depends on
$\|A - A_r\|_F$, which is usually unknown. The nuclear norm minimization problem in (2)
suffers from the same issue through its parameter $\delta$. We will investigate how to estimate $\lambda$ in the future.